- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 10.8k
[Frontend] Add /classify endpoint #17032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| 👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run  Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add  🚀 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments. Can you also add an example script to examples/online_serving to show how to use this endpoint?
| 
 Yes! It's  | 
| 
 Oh, right. Sorry I missed that | 
| 
 Really appreciate the feedback! I’ve implemented your suggestions in the latest commit. I’ve noticed that  | 
| 
 Sure! | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encoding_format doesn't make much sense outside of pooling requests. Maybe we should have a separate mixin class for pooling request handlers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encoding_formatdoesn't make much sense outside of pooling requests. Maybe we should have a separate mixin class for pooling request handlers?
Yeah. It looks like only serving_pool.py and serving_embedding.py are using it.
serving_embedding.py and serving_classification.py are straightforward to refactor; however, transcription and score each have some twists on the base pattern. I left the other endpoints untouched to avoid a large-scale overhaul.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DarkLight1337 The latest commit includes the following changes:
- 
Added specialized contexts like - ClassificationServeContext
- EmbeddingServeContext
 
- 
The OpenAIServingclass has the following to take care of common processing logic:- handle()
- _pipeline()
- _validate_request()
- _prepare_generators()
- _collect_batch()
 
- 
Added two abstract classes including _preprocess()and_build_response, which I also added to the following classes to avoid mypy errors:- serving_chat.py
- serving_completion.py
- serving_tokenization.py
- serving_score.py
- serving_pooling.py
- serving_transcription.py
 
- 
I didn’t add a separate mixin for encoding_formatbecause I found it’s already accessible viactx.request.encoding_format, and I thought creating a mixin would be overkill.
- 
I've introduced RequestTfor request handling. I'm considering addingResponseTas well, and updatingOpenAIServingto classOpenAIServing(Generic[RequestT, ResponseT]). What are your thoughts on this approach? That way, we can do something like the following for all the subclasses:
class OpenAIServingEmbedding(OpenAIServing[EmbeddingRequest, EmbeddingResponse]):
    ...
async def handle(self, ctx: ServeContext[EmbeddingRequest]) -> Union[EmbeddingResponse, ErrorResponse]:
    # Process the embedding request
    return EmbeddingResponse(...)  # or ErrorResponse if there's an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you going to address these in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of addressing these in a different PR, since this PR is specifically about /classify, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can't do this in this PR then I suggest using a mixin for now until the other serving classes have also been migrated, so that we don't have dead code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness, let me do the migration on the rest classes as well! Is there anything else regarding all the endpoints that I need to be aware of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I'm aware of. As long as the tests pass it should be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DarkLight1337  Sorry for the delay. I was migrating the rest of subclasses and encountered a lot of issues regarding running tests on my local machine (M2 chip). test_chat.py for original implementation fails due to RuntimeError: Server exited unexpectedly but works just fine on a cloud GPU instance. It will take substantial hours than I expected. Given the demand for the endpoint, I will update the current PR by using the mixin approach. I will push the change tomorrow.
| This pull request has merge conflicts that must be resolved before it can be | 
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your time! LGTM if the tests pass
| There is a failing test that's relevant to this PR, please fix it | 
| 
 Ok. I'm on it. | 
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
| 
 Hi @DarkLight1337. Do you know when my PR will get merged? | 
| Starting the merge process now, sorry for the delay! | 
| 
 Thank you! Looks like some tests are failing. I'll fix them! | 
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
FIX #13567
FIX #17415